ARROW-9369: [Python] Support conversion from python sequence to dictionary type#8008
ARROW-9369: [Python] Support conversion from python sequence to dictionary type#8008kszucs wants to merge 6 commits into
Conversation
| DICTIONARY_BINARY_LIKE(BINARY, BinaryType); | ||
| DICTIONARY_BINARY_LIKE(STRING, StringType); | ||
| // DICTIONARY_BINARY_LIKE(LARGE_BINARY, LargeBinaryType); | ||
| // DICTIONARY_BINARY_LIKE(LARGE_STRING, LargeStringType); |
There was a problem hiding this comment.
Is there a reason this isn't enabled? Is the PR unfinished?
There was a problem hiding this comment.
Yes, there are a couple of design decisions we need to make because of the following problems:
- the index key type is ignored since adaptive builder is used
- large binary/string types are not supported by the builder which I'm not sure whether is intentional or not
| Concrete class for dictionary-encoded scalars. | ||
| """ | ||
|
|
||
| def __init__(self, index, dictionary, type): |
There was a problem hiding this comment.
Maybe we can put this in some from_.. class method? (to keep it consistent with the other scalars that __init__ raises)
There was a problem hiding this comment.
(and can you also add some tests for this construction method?)
| Status AppendValue(PyObject* obj) override { | ||
| ARROW_ASSIGN_OR_RAISE(string_view_, ValueConverter<ValueType>::FromPython(obj)); | ||
| // DCHECK_GE(string_view_.size, 0); | ||
| RETURN_NOT_OK(this->typed_builder_->Append(string_view_.bytes, |
There was a problem hiding this comment.
Isn't there a TypedBuilder::Append(string_view)? If not, can you please add it?
| public: | ||
| Status AppendValue(PyObject* obj) override { | ||
| ARROW_ASSIGN_OR_RAISE(string_view_, ValueConverter<ValueType>::FromPython(obj)); | ||
| // DCHECK_GE(string_view_.size, 0); |
| Status AppendValue(PyObject* obj) override { | ||
| ARROW_ASSIGN_OR_RAISE( | ||
| string_view_, ValueConverter<FixedSizeBinaryType>::FromPython(obj, byte_width_)); | ||
| RETURN_NOT_OK(this->typed_builder_->Append(string_view_.bytes, |
There was a problem hiding this comment.
Same here: it would be good to have TypedBuilder::Append(string_view).
| } | ||
|
|
||
| protected: | ||
| int32_t byte_width_; |
| std::shared_ptr<DecimalType> decimal_type_; | ||
| }; | ||
|
|
||
| #define DICTIONARY_PRIMITIVE(TYPE_ENUM, TYPE_CLASS) \ |
| assert isinstance(a.type, pa.DictionaryType) | ||
| assert a.type.equals(typ) | ||
|
|
||
| expected_indices = pa.array([0, 1, 0, 0, 1, 2], type=pa.int8()) |
There was a problem hiding this comment.
Can you test with nulls at some point?
| def test_dictionary_from_strings(): | ||
| for value_type in [pa.binary(), pa.string()]: | ||
| typ = pa.dictionary(pa.int8(), value_type) | ||
| a = pa.array(["", "a", "bb", "a", "bb", "ccc"], type=typ) |
| with pytest.raises(pa.ArrowNotImplementedError): | ||
| pickle.loads(pickle.dumps(s)) | ||
| restored = pickle.loads(pickle.dumps(s)) | ||
| assert restored.equals(s) |
There was a problem hiding this comment.
This should be inside the loop.
(also, null values are not tested?)
|
We can close this in favor of #8088 |
No description provided.